Regular Path Queries (RPQs) are a type of graph query whose answers are pairs of nodes connected by a sequence of edges matching a regular expression. We study techniques for processing such queries on a distributed graph of data. While many techniques assume that the location of each data element (node or edge) is known, when the components of the distributed system are autonomous, the data will be arbitrarily distributed, or non-localized. We compare query processing strategies for this setting analytically and empirically, using biomedical data and meaningful queries. We isolate query-dependent cost factors and present a method for choosing between strategies, using new query cost estimation techniques.
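To make the definition of an RPQ concrete, the following is a minimal, single-machine sketch of RPQ evaluation. It is not one of the distributed strategies compared in this work. It assumes the regular expression has already been compiled into a nondeterministic finite automaton (NFA), and it searches the product of graph nodes and automaton states. The function name `evaluate_rpq`, the edge labels, and the node names are hypothetical and chosen only for illustration.

```python
from collections import deque


def evaluate_rpq(edges, nfa_start, nfa_accept, nfa_delta):
    """Return all (source, target) pairs joined by a path whose
    sequence of edge labels is accepted by the given NFA.

    edges      -- iterable of (source, label, target) triples
    nfa_start  -- initial automaton state
    nfa_accept -- set of accepting automaton states
    nfa_delta  -- dict mapping (state, label) to a set of next states
    """
    # Index outgoing edges by source node.
    adjacency = {}
    nodes = set()
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, nfa_start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in nfa_accept:
                answers.add((source, node))
            for label, neighbour in adjacency.get(node, []):
                for next_state in nfa_delta.get((state, label), ()):
                    pair = (neighbour, next_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
    return answers


# Hypothetical toy graph with biomedical-style labels.
edges = [
    ("geneA", "encodes", "protB"),
    ("protB", "interacts", "protC"),
    ("protC", "interacts", "protD"),
]

# Hand-built NFA for the expression: encodes interacts+
delta = {
    (0, "encodes"): {1},
    (1, "interacts"): {2},
    (2, "interacts"): {2},
}

answers = evaluate_rpq(edges, nfa_start=0, nfa_accept={2}, nfa_delta=delta)
print(sorted(answers))
```

In the non-localized setting described above, the adjacency lookup could not be answered from a local index, because the edges leaving a node may be held by any site. That lookup is the step a distributed strategy would have to replace.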